Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] LLVM/MHLO uplift 09/04/2023 #2484

Closed
wants to merge 7 commits into from
Closed

Conversation

jcwchen
Copy link
Member

@jcwchen jcwchen commented Sep 6, 2023

LLVM/MHLO uplift 09/04/2023.

Bump versions as below:

  • Green LLVM commit: 6098d7d5f6533edb1b873107ddc1acde23b9235b
  • Green MHLO commit: 5a4c066815ba4358919ced9539e899dfd8dee287

@jenkins-droid
Copy link
Collaborator

Can one of the admins verify this patch?

@jcwchen jcwchen marked this pull request as draft September 6, 2023 16:29
@sorenlassen
Copy link
Member

@jenkins-droid test this please

@sorenlassen
Copy link
Member

@gongsu832 can you please see if you can make jenkins run on this MR?

@gongsu832
Copy link
Collaborator

@jenkins-droid test this please

@gongsu832
Copy link
Collaborator

@sorenlassen what browser you are using? Your browser is sending extra chars at the end of the message:

"body": "@jenkins-droid test this please\r\n",

Although it really shouldn't matter since the regex is @jenkins-droid test this please.* but I'm wondering maybe it's triggering some bug in the Generic Webhook Trigger plugin's pattern matching logic. Maybe you can try a different browser?

@sorenlassen
Copy link
Member

sorenlassen commented Sep 6, 2023

@sorenlassen what browser you are using?

Chrome

Your browser is sending extra chars at the end of the message:

I might have added a newline in the message, I can try to make sure not to do that, and I can also try a different browser - I can try that next time @jcwchen updates this MR

Signed-off-by: jcwchen <[email protected]>
Signed-off-by: jcwchen <[email protected]>
@jenkins-droid
Copy link
Collaborator

Can one of the admins verify this patch?

@jcwchen
Copy link
Member Author

jcwchen commented Sep 7, 2023

Hi -- this is my first time to bump LLVM/MHLO commit in onnx-mlir and therefore I might need the community's help/guidance. After the updates, although cmake --build . for onnx-mlir is fine, there are quite a few broken tests when running cmake --build . --target check-onnx-lit, mostly under test/mlir/xxx/xxx.mlir. I have tried running https://github.com/onnx/onnx-mlir/blob/main/utils/fixLitTest.py, but it still throws an error without fixing the error automatically.

Might be a naive question: Shall I focus on fixing the test itself or try to fix the implementation logic instead? Any pointer/suggestion is greatly appreciated. Thank you for looking into it!

@sorenlassen
Copy link
Member

@jenkins-droid test this please

@sorenlassen
Copy link
Member

@gongsu832 fyi, the test-me-please that I posted a moment ago was entered with Safari, with no newline at the end

@sorenlassen
Copy link
Member

Shall I focus on fixing the test itself or try to fix the implementation logic instead?

it's ideal if you can take a close look at the failures and find the root causes (probably just one or two root causes that make several tests fail) and then make a judgement call whether to modify the tests or fix what made the tests fail, then we can discuss in the code review if we agree on your solutions

if you can't come up with a solution, or can't decide which solution to go with, then please share your analysis of the problem in the comments and then folks can chime in to help

Signed-off-by: jcwchen <[email protected]>
@jenkins-droid
Copy link
Collaborator

Can one of the admins verify this patch?

Signed-off-by: jcwchen <[email protected]>
@jenkins-droid
Copy link
Collaborator

Can one of the admins verify this patch?

Signed-off-by: jcwchen <[email protected]>
@jenkins-droid
Copy link
Collaborator

Can one of the admins verify this patch?

Signed-off-by: jcwchen <[email protected]>
@jenkins-droid
Copy link
Collaborator

Can one of the admins verify this patch?

@jcwchen
Copy link
Member Author

jcwchen commented Sep 8, 2023

Updates:
With bumping LLVM commit in onnx-mlir, although the build is fine, 100 tests under test/mlir/xxx/xxx.mlir failed. After several experiments, I still cannot find a general way to fix them globally (also tried replacement with regular regression in all files, but no luck). Now I am planning to go through the failed 100 tests one by one and manually identify whether the target line needs to be changed.

The most issue for missing "<>" is as below:

%0 = "onnx.MaxPoolSingleOut"(%arg0) {auto_pad = "VALID", ceil_mode = 0 : si64, kernel_shape = [3,3]}

needs to be added "<" and ">" for attributes "{}" as

%0 = "onnx.MaxPoolSingleOut"(%arg0) <{auto_pad = "VALID", ceil_mode = 0 : si64, kernel_shape = [3,3]}>

Other than that, I found some tests became crash after LLVM bump and I haven't figured out why. For instance, test\mlir\conversion\onnx_to_krnl\Additional\Custom_with_canonicalize.mlir, test\mlir\conversion\onnx_to_krnl\Additional\ShapeTransform_with_canonicalize.mlir, test\mlir\conversion\onnx_to_krnl\ControlFlow\If_with_canonicalize.mlir... and etc.

Now I have successfully fixed 10 failed tests by hand, but it will take quite a long time to fix every of them (90 failed tests left...). I am not sure whether I am on the right track. Therefore, if anyone has better clue to fix these test failures, please let me know. Appreciate it!

@philass
Copy link
Member

philass commented Sep 11, 2023

Thanks for the work so far @jcwchen!

We have our weekly onnx-mlir meeting tomorrow. @AlexandreEichenberger has a magical script that can often repair these lit tests. I'll try and get his input on if there is a special set of flags which can fix these lit tests.

I'll also take a look at crashing llvm tests.

@jcwchen
Copy link
Member Author

jcwchen commented Sep 11, 2023

@philass Thank you so much for the help!

@AlexandreEichenberger has a magical script that can often repair these lit tests. I'll try and get his input on if there is a special set of flags which can fix these lit tests.

I guess you are talking about onnx/onnx-mlir/blob/main/utils/fixLitTest.py? I have tried it for a while last week, but it seems that it can only auto fix few failed tests. Not sure whether I used it in the right way tho. Perhaps there is something needs to be updated in that script according to the new LLVM bump.

@AlexandreEichenberger
Copy link
Collaborator

Not sure about the magical part, but yes I have a script that can re-generate the lit tests. It has been used mostly for some passes and can be a bit fragile. The gist is:

fixLitTest.py ../test/mlir/conversion/onnx_to_krnl/NN/Normalization.mlir 

to go though each test individually and eventually list the erroneous files. Adding the -r flag will print on stdout the original mlir test files for all the successfully passed tests, and a repaired version test for the ones that were broken.

Unfortunately, I know of tests that break the "repair" script; in addition, the fixLitTest.py (located in utils) requires functions to be separated by // ----- patterns that sometimes are missing.

If the issue is just the < and >, have you tried a simple python / awk / sed program that scans for maybe ".onnx.{.*}" and perform the replacement? Python re.match might be the most full-featured, possibly sed is enough.

@jcwchen
Copy link
Member Author

jcwchen commented Sep 11, 2023

Thanks @AlexandreEichenberger for the details of fixLitTest.py.

Unfortunately, I know of tests that break the "repair" script; in addition, the fixLitTest.py (located in utils) requires functions to be separated by // ----- patterns that sometimes are missing.

+1. I bumped into that kind of error while using fixLitTest.py on some failed tests.

If the issue is just the < and >, have you tried a simple python / awk / sed program that scans for maybe ".onnx.{.*}" and perform the replacement? Python re.match might be the most full-featured, possibly sed is enough.

Yes, at least I have tried regular expression (as mentioned here) and it will involve many irrelevant changes somehow (actually ".onnx.{.*} is not the only format). However, honestly I am not very familiar with these kinds of tools and so I will spend more time trying them to see whether the syntax issue can be solved globally.

@AlexandreEichenberger
Copy link
Collaborator

If you email me a file to fix, with the pattern, I can try to whip up a script. I would rather not update my working copy unless I really have to.

@jcwchen
Copy link
Member Author

jcwchen commented Sep 11, 2023

If you email me a file to fix, with the pattern, I can try to whip up a script. I would rather not update my working copy unless I really have to.

I will provide some quick context first: some of files in this PR might helpful and also you can preview the difference in https://github.com/onnx/onnx-mlir/pull/2484/files -- For instance,

  • test/mlir/conversion/instrument/add.mlir
  • test/mlir/conversion/instrument/onnx_add.mlir
  • test/mlir/conversion/instrument/onnx_all.mlir
  • test/mlir/krnl/krnl_global_elision.mlir
  • test/mlir/onnx/onnx_shape_inference_maxpool.mlir

Basically the pattern are ".onnx.{.}" and ".krnl.{.}". If these are not sufficient, I can email you more. Thank you for looking into it!

@AlexandreEichenberger
Copy link
Collaborator

AlexandreEichenberger commented Sep 12, 2023

I have not enough info to know really which pattern needs to be changed. But if you want to change every {.*} patterns in <{.*}>, then the following script would work

sed "s/{\([^}]*\)}/<{\1}>/g" onnx_add.mlir > bibi.mlir

The big hope here is that there are no nested {} patterns. Please let me know if you want a pattern that only apply for onnx or krnl and I can certainly adapt the pattern.

Maybe a realistic goal is to use a simple script here to fix most of the cases, and manually fix the remaining exceptions.

@jcwchen
Copy link
Member Author

jcwchen commented Sep 12, 2023

The big hope here is that there are no nested {} patterns. Please let me know if you want a pattern that only apply for onnx or krnl and I can certainly adapt the pattern.

Maybe a realistic goal is to use a simple script here to fix most of the cases, and manually fix the remaining exceptions.

I came up with a similar regular expression as yours, but unfortunately there are indeed a few nested {} patterns. Applying the replacement globally still failed most of failed tests. I agreed probably I still need to go through them manually -- At least I have the generic regular expression to find the potential matches and fix them one by one by hand, although there are quite many (100 failed files and each of them have multiple matched lines). Thank you for trying it out and the suggestion!

@philass
Copy link
Member

philass commented Sep 12, 2023

@jcwchen I can take this on

@AlexandreEichenberger
Copy link
Collaborator

@jcwchen I can take this on

Thanks @philass.

One clunky way to handle recursive patterns is to detect this: {.*{[^{}]*}.*, namely a contained {} inside of another, then change the inner "{}" to something that is not found, say ### for { and @@@ for '}', and do that as many times as needed until there are no changes (or say 20 times). Then you apply the rule to the outermost (like the sed mentioned before). Then you revert back into one swoop all the ### and @@@ back to { and }.

@jcwchen
Copy link
Member Author

jcwchen commented Sep 13, 2023

@philass thank you so much for taking over! Sorry that I cannot be more helpful (first time doing the version bump and quite new to onnx-mlir). Since the update work this time seems quite large to me (fixing 100 test failures), please let me know if you want to decouple the bump work and I am happy to take some.

@philass
Copy link
Member

philass commented Sep 13, 2023

@jcwchen this one is particularly painful. Some MLIR tribal knowledge has already come in handy.

Setting let usePropertiesForAttributes = 0; got me down to 40 lit tests failing. So a lot of the failures in the lit tests were outside of the regex issues.

Testing Time: 3.68s
  Unsupported: 115
  Passed     : 160
  Failed     :  40

I was lucky in that I had already read this discussion https://discourse.llvm.org/t/rfc-enabling-properties-for-attribute-storage-by-default/72900/4. So when I saw properties in the backtrace of my lldb session, my spidy senses started tingling.

And in general, thanks for all the work you do in the ONNX ecosystem!

@philass
Copy link
Member

philass commented Sep 13, 2023

@philass thank you so much for taking over! Sorry that I cannot be more helpful (first time doing the version bump and quite new to onnx-mlir). Since the update work this time seems quite large to me (fixing 100 test failures), please let me know if you want to decouple the bump work and I am happy to take some.

It's all good. I'll take it from here. I'll probably make a new PR that builds on top of the work you already did in your branch.

@philass
Copy link
Member

philass commented Sep 14, 2023

I'm closing this as we landed #2502

@philass philass closed this Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants